Tony Duan
November 10, 2022
---
title: "【Reading】R for Data Science Part 1 C1-C16"
author: "Tony Duan"
date: "2022-11-08"
categories: [Book]
execute:
  warning: false
  error: false
format:
  html:
    toc: true
    code-fold: show
    code-tools: true
---
Reading notes on *R for Data Science* by Hadley Wickham & Garrett Grolemund: https://r4ds.had.co.nz/

## 1-2 Introduction

```{r}
library(tidyverse)
#tidyverse_update()
library(nycflights13)
library(gapminder)
library(Lahman)
```
```{r}
1 + 2
```
## 3 Data visualisation
ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs. With ggplot2, you can do more, faster, by learning one system and applying it in many places.
### Creating a ggplot
To plot `mpg`, run this code to put `displ` on the x-axis and `hwy` on the y-axis:
```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```
Map the colors of your points to the `class` variable to reveal the class of each car:
```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
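Mapping an aesthetic inside `aes()` ties it to a variable, whereas setting it outside `aes()` applies one fixed value to every point. The following is a minimal sketch of that difference, assuming only that ggplot2 is installed; the object names `p_mapped` and `p_fixed` are illustrative and do not come from the book.

```{r}
library(ggplot2)

# Mapped: colour varies with the `class` variable, and a legend is drawn.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Set: `color` sits outside aes(), so every point is blue and no legend is drawn.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

p_mapped
p_fixed
```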
## 4 Workflow: basics
Assignment statements create objects. The idiomatic assignment operator in R is `<-`:
```{r}
a <- 1
a
```
Calling functions, for example `seq()`, which makes regular sequences of numbers:
```{r}
seq(1, 10)
```
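Function arguments can be passed by position or by name. The sketch below uses base R only; the variable names `odds` and `quarters` are made up for illustration.

```{r}
# Positional arguments: from = 1, to = 10.
seq(1, 10)

# Named arguments make the intent explicit and give access to other options.
odds <- seq(from = 1, to = 10, by = 2)   # 1 3 5 7 9
quarters <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

odds
quarters
```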
## 5 Data transformation
```{r}
library(nycflights13)
library(tidyverse)
```
```{r}
glimpse(flights)
```
- `int` stands for integers.
- `dbl` stands for doubles, or real numbers.
- `chr` stands for character vectors, or strings.
- `dttm` stands for date-times (a date + a time).
- `lgl` stands for logical, vectors that contain only `TRUE` or `FALSE`.
- `fct` stands for factors, which R uses to represent categorical variables with fixed possible values (older versions of tibble print `fctr`).
- `date` stands for dates.
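
To see these abbreviations side by side, one option is to build a small tibble with one column per type and ask pillar (the package tibble uses for printing) for each column's type summary. This is only a sketch: the tibble `types_demo` and its values are made up, and current tibble versions abbreviate factors as `fct`.

```{r}
library(tibble)

# A toy tibble with one column per type listed above (values are made up).
types_demo <- tibble(
  n_int  = 1:2,
  x_dbl  = c(1.5, 2.5),
  s_chr  = c("a", "b"),
  t_dttm = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-01 06:00:00"), tz = "UTC"),
  b_lgl  = c(TRUE, FALSE),
  f_fct  = factor(c("low", "high")),
  d_date = as.Date(c("2013-01-01", "2013-01-02"))
)

# pillar::type_sum() returns the short type label for each column.
abbrevs <- vapply(types_demo, pillar::type_sum, character(1))
abbrevs
```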
### Filter rows with `filter()`
```{r}
filter(flights, month == 1, day == 1)
```
### Comparisons
```{r}
1 == 1
```
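A common pitfall with `==` is floating-point arithmetic: computers store doubles with finite precision, so results that are mathematically equal may not compare as equal. A short sketch, assuming dplyr is installed, using `near()` as the tolerant alternative:

```{r}
library(dplyr)

# Floating-point arithmetic is approximate, so exact equality can fail.
sqrt(2)^2 == 2        # FALSE
1 / 49 * 49 == 1      # FALSE

# near() compares within a small tolerance instead.
near(sqrt(2)^2, 2)    # TRUE
near(1 / 49 * 49, 1)  # TRUE
```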
### Or (`|`)
```{r}
filter(flights, month == 11 | month == 12)
```
### In (`%in%`)
```{r}
filter(flights, month %in% c(11, 12))
```
### Not in (`!` with `%in%`)
```{r}
filter(flights, !(month %in% c(11, 12)))
```
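The three filters above are closely related. The sketch below checks the relationships on a toy tibble rather than on `flights`, so it runs without nycflights13; `months_demo` and the result names are made up for illustration.

```{r}
library(dplyr)

# Toy data (made up): one row per month.
months_demo <- tibble(month = 1:12)

# `|` and `%in%` select the same rows.
with_or <- filter(months_demo, month == 11 | month == 12)
with_in <- filter(months_demo, month %in% c(11, 12))
identical(with_or, with_in)

# Negating `%in%` is equivalent to requiring every comparison to fail
# (De Morgan's law: !(a | b) is the same as !a & !b).
not_in  <- filter(months_demo, !(month %in% c(11, 12)))
neither <- filter(months_demo, month != 11 & month != 12)
identical(not_in, neither)
```

Writing the negation as `!(month %in% ...)` with explicit parentheses behaves the same as `!month %in% ...`, but makes the order of evaluation obvious to the reader.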
### Missing values
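
Missing values (`NA`) need care in comparisons, because almost any operation involving an unknown value is itself unknown. The following is a minimal sketch, assuming dplyr is installed; the tibble `na_demo` is made-up illustration data.

```{r}
library(dplyr)

# NA is "contagious": almost any operation involving it returns NA.
NA > 5       # NA
NA == NA     # NA
is.na(NA)    # TRUE

# Toy data (made up) with one missing value.
na_demo <- tibble(x = c(1, NA, 3))

# filter() keeps only rows where the condition is TRUE, so the NA row is dropped.
filter(na_demo, x > 1)

# Ask for missing values explicitly to keep them.
filter(na_demo, is.na(x) | x > 1)
```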
## 6 Workflow: scripts
## 7 Exploratory Data Analysis
## 8 Workflow: projects
## 9 Introduction
## 10 Tibbles
## 11 Data import
## 12 Tidy data
## 13 Relational data
## 14 Strings
## 15 Factors
## 16 Dates and times
## Reference
\[Book\] *ggplot2: Elegant Graphics for Data Analysis* by Hadley Wickham: https://ggplot2-book.org/index.html